An Evaluation of Discretization Methods for Learning Rules from Biomedical Datasets

Authors

  • Jonathan L. Lustgarten
  • Shyam Visweswaran
  • Himanshu Grover
  • Vanathi Gopalakrishnan
Abstract

Rule learning has the major advantage of producing models that human experts can understand when performing knowledge discovery within the biomedical domain. Many rule learning algorithms require discrete data in order to learn IF-THEN rule sets, which makes the selection of a discretization technique an important step in rule learning. We compare one standard technique, Fayyad and Irani’s Minimum Description Length Principle Criterion, which is the de facto discretization method in many machine learning packages, with a new Efficient Bayesian Discretization (EBD) method, and show that EBD leads to significant gains in performance, especially as the complexity of the rule learner increases.
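To make the baseline concrete, the following is a minimal sketch of entropy-based discretization with the MDLP stopping rule as it is commonly stated for Fayyad and Irani's method: a cut is accepted only if its information gain exceeds (log2(N - 1) + delta) / N. This is not the authors' code, and it does not implement EBD. The function names (`entropy`, `best_cut`, `mdlp_accepts`, `mdlp_cut_points`) are hypothetical, and for simplicity the sketch tests every boundary between distinct values rather than only class-boundary points.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Best binary split of sorted data as (gain, threshold, index), or None."""
    n = len(values)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if values[i - 1] == values[i]:
            continue  # no threshold can separate identical values
        left, right = labels[:i], labels[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, (values[i - 1] + values[i]) / 2, i)
    return best


def mdlp_accepts(labels, left, right, gain):
    """MDLP criterion: accept the cut only if the gain pays for its coding cost."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cut_points(values, labels):
    """Recursively collect accepted cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    vals = [v for v, _ in pairs]
    labs = [l for _, l in pairs]
    found = best_cut(vals, labs)
    if found is None:
        return []
    gain, threshold, i = found
    if not mdlp_accepts(labs, labs[:i], labs[i:], gain):
        return []
    return (
        mdlp_cut_points(vals[:i], labs[:i])
        + [threshold]
        + mdlp_cut_points(vals[i:], labs[i:])
    )


# Toy data (made up for illustration): a clean class change between 5 and 6.
separable = mdlp_cut_points(list(range(1, 11)), ["a"] * 5 + ["b"] * 5)
# Alternating labels carry no usable signal, so no cut should be accepted.
noisy = mdlp_cut_points(list(range(1, 11)), ["a", "b"] * 5)
```

The resulting cut points define the intervals that a rule learner would then treat as discrete values of the attribute. On the toy data, the clean class boundary yields a single cut and the alternating labels yield none.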

Similar resources

An Evolutionary Multi-objective Discretization based on Normalized Cut

Learning models and their results depend on the quality of the input data. If raw data is not properly cleaned and structured, the results tend to be incorrect. Discretization, as one of the preprocessing techniques, therefore plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features’ values. This operat...

Analyzing Data Clusters: A Rough Sets Approach to Extract Cluster-Defining Symbolic Rules

In this paper we present a strategy together with its computational implementation to intelligently analyze data clusters in terms of symbolic cluster-defining rules. We present a symbolic rule extraction workbench that leverages rough set theory to inductively extract CNF form symbolic rules from un-annotated continuous-valued data-vectors. Our workbench purports a hybrid rule extraction metho...

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High-dimensional microarray datasets are difficult to classify since they have many features with a small number of instances and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...

Hybrid System based on Rough Sets and Genetic Algorithms for Medical Data Classifications

Computational intelligence provides significant support to the biomedical domain. The application of machine learning techniques in medicine has evolved from physicians' needs. Screening, medical imaging, pattern classification, and prognosis are some examples of health care support systems. Typically, medical data has its own characteristics, such as huge in size and features, c...

Experimental Evaluation of Discretization Schemes for Rule Induction

This paper proposes an experimental evaluation of various discretization schemes in three different evolutionary systems for inductive concept learning. The various discretization methods are used in order to obtain a number of discretization intervals, which represent the basis for the methods adopted by the systems for dealing with numerical values. Basically, for each rule and attribute, one...

Publication date: 2008